320 research outputs found

    Approximating the double-cut-and-join distance between unsigned genomes

    In this paper we study the problem of sorting unsigned genomes by double-cut-and-join operations, where genomes allow a mix of linear and circular chromosomes to be present. First, we formulate an equivalent optimization problem, called maximum cycle/path decomposition, which is aimed at finding a largest collection of edge-disjoint cycles/AA-paths/AB-paths in a breakpoint graph. Then, we show that the problem of finding a largest collection of edge-disjoint cycles/AA-paths/AB-paths of length no more than l can be reduced to the well-known degree-bounded k-set packing problem with k = 2l. Finally, a polynomial-time approximation algorithm for the problem of sorting unsigned genomes by double-cut-and-join operations is devised, which achieves the approximation ratio for any positive ε. For the restricted variation where each genome contains only one linear chromosome, the approximation ratio can be further improved t

    The Fibers and Range of Reduction Graphs in Ciliates

    The biological process of gene assembly has been modeled based on three types of string rewriting rules, called string pointer rules, defined on so-called legal strings. It has been shown that reduction graphs, graphs that are based on the notion of breakpoint graph in the theory of sorting by reversal, for legal strings provide valuable insights into the gene assembly process. We characterize which legal strings obtain the same reduction graph (up to isomorphism), and moreover we characterize which graphs are (isomorphic to) reduction graphs. Comment: 24 pages, 13 figures

    CTCF binding site classes exhibit distinct evolutionary, genomic, epigenomic and transcriptomic features

    CTCF DNA binding sites are classified into distinct functional classes, with distinct biological properties, shedding light on the differing functional roles of CTCF binding

    An asymmetric approach to preserve common intervals while sorting by reversals

    Dias Vieira Braga M, Gautier C, Sagot M-F. An asymmetric approach to preserve common intervals while sorting by reversals. Algorithms for Molecular Biology. 2009;4(1):16. Background: The reversal distance and optimal sequences of reversals to transform a genome into another are useful tools to analyse evolutionary scenarios. However, the number of sequences is huge and some additional criteria should be used to obtain a more accurate analysis. One strategy is searching for sequences that respect constraints, such as the common intervals (clusters of co-localised genes). Another approach is to explore the whole space of sorting sequences, eventually grouping them into classes of equivalence. Recently both strategies started to be put together, to restrain the space to the sequences that respect constraints. In particular an algorithm has been proposed to list classes whose sorting sequences do not break the common intervals detected between the two initial genomes A and B. This approach may reduce the space of sequences and is symmetric (the result of the analysis sorting A into B can be obtained from the analysis sorting B into A). Results: We propose an alternative approach to restrain the space of sorting sequences, using progressive instead of initial detection of common intervals (the list of common intervals is updated after applying each reversal). This may reduce the space of sequences even more, but is shown to be asymmetric. Conclusions: We suggest that our method may be more realistic when the ancestor-descendant relation between the analysed genomes is clear and we apply it to do a better characterisation of the evolutionary scenario of the bacterium Rickettsia felis with respect to one of its ancestors
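
The notion of a reversal that does not break a common interval can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: the function names are hypothetical, genomes are assumed to be signed permutations stored as lists of non-zero integers, and an interval is taken to be preserved when its genes still occupy consecutive positions after the reversal.

```python
def apply_reversal(perm, i, j):
    """Reverse perm[i..j] (inclusive) and flip the sign of each gene in it."""
    return perm[:i] + [-g for g in reversed(perm[i:j + 1])] + perm[j + 1:]


def is_contiguous(perm, genes):
    """True if the genes (signs ignored) occupy consecutive positions in perm."""
    positions = [k for k, g in enumerate(perm) if abs(g) in genes]
    if not positions or len(positions) != len(genes):
        return False
    return max(positions) - min(positions) + 1 == len(genes)


def preserves_intervals(perm, i, j, intervals):
    """True if reversing perm[i..j] keeps every given interval contiguous."""
    after = apply_reversal(perm, i, j)
    return all(is_contiguous(after, interval) for interval in intervals)


# Illustrative data (an assumption, not taken from the paper):
genome_a = [1, -3, -2, 4]      # to be sorted into the identity [1, 2, 3, 4]
common = [{2, 3}]              # genes 2 and 3 are adjacent in both genomes

print(apply_reversal(genome_a, 1, 2))               # sorting reversal
print(preserves_intervals(genome_a, 1, 2, common))  # keeps {2, 3} together
print(preserves_intervals(genome_a, 0, 1, common))  # splits {2, 3}
```

In the progressive variant described in the abstract, the list passed as `intervals` would be recomputed after each applied reversal rather than fixed from the two initial genomes.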

    Bayesian Integration of Genetics and Epigenetics Detects Causal Regulatory SNPs Underlying Expression Variability

    The standard expression quantitative trait loci (eQTL) analysis detects polymorphisms associated with gene expression without revealing causality. We introduce a coupled Bayesian regression approach, eQTeL, which leverages epigenetic data to estimate regulatory and gene interaction potential, and identifies combinations of regulatory single-nucleotide polymorphisms (SNPs) that explain the gene expression variance. On human heart data, eQTeL not only explains a significantly greater proportion of expression variance but also predicts gene expression more accurately than other methods. Based on realistic simulated data, we demonstrate that eQTeL accurately detects causal regulatory SNPs, including those with small effect sizes. Using various functional data, we show that SNPs detected by eQTeL are enriched for allele-specific protein binding and histone modifications, which potentially disrupt binding of core cardiac transcription factors and are spatially proximal to their target. eQTeL SNPs capture a substantial proportion of genetic determinants of expression variance and we estimate that 58% of these SNPs are putatively causal

    A Unifying Model of Genome Evolution Under Parsimony

    We present a data structure called a history graph that offers a practical basis for the analysis of genome evolution. It conceptually simplifies the study of parsimonious evolutionary histories by representing both substitutions and double cut and join (DCJ) rearrangements in the presence of duplications. The problem of constructing parsimonious history graphs thus subsumes related maximum parsimony problems in the fields of phylogenetic reconstruction and genome rearrangement. We show that tractable functions can be used to define upper and lower bounds on the minimum number of substitutions and DCJ rearrangements needed to explain any history graph. These bounds become tight for a special type of unambiguous history graph called an ancestral variation graph (AVG), which constrains in its combinatorial structure the number of operations required. We finally demonstrate that for a given history graph G, a finite set of AVGs describe all parsimonious interpretations of G, and this set can be explored with a few sampling moves. Comment: 52 pages, 24 figures

    Genomic distance under gene substitutions

    Dias Vieira Braga M, Machado R, Ribeiro LC, Stoye J. Genomic distance under gene substitutions. BMC Bioinformatics. 2011;12(Suppl 9: Proc. of RECOMB-CG 2011): S8. Background: The distance between two genomes is often computed by comparing only the common markers between them. Some approaches are also able to deal with non-common markers, allowing the insertion or the deletion of such markers. In these models, a deletion and a subsequent insertion that occur at the same position of the genome count for two sorting steps. Results: Here we propose a new model that sorts non-common markers with substitutions, which are more powerful operations that comprehend insertions and deletions. A deletion and an insertion that occur at the same position of the genome can be modeled as a substitution, counting for a single sorting step. Conclusions: Comparing genomes with unequal content, but without duplicated markers, we give a linear time algorithm to compute the genomic distance considering substitutions and double-cut-and-join (DCJ) operations. This model provides a parsimonious genomic distance to handle genomes free of duplicated markers, that is in practice a lower bound to the real genomic distances. The method could also be used to refine orthology assignments, since in some cases a substitution could actually correspond to an unannotated orthology

    Identification and Functional Characterization of Gene Components of Type VI Secretion System in Bacterial Genomes

    A new secretion system, called the Type VI Secretion System (T6SS), was recently reported in Vibrio cholerae, Pseudomonas aeruginosa and Burkholderia mallei. A total of 18 genes have been identified as belonging to this secretion system in V. cholerae. Here we attempt to identify the presence of T6SS in other bacterial genomes. This includes identification of orthologous sequences, conserved motifs, domains, families, 3D folds, genomic islands containing T6SS components, phylogenetic profiles and protein-protein association of these components. Our analysis indicates the presence of T6SS in 42 bacteria and its absence in most of their non-pathogenic species, suggesting the role of T6SS in imparting pathogenicity to an organism. Analysis of genomic regions containing T6SS components, phylogenetic profiles and protein-protein association of T6SS components indicates a few additional genes which could be involved in this secretion system. Based on our studies, functional annotations were assigned to most of the components. Except for one gene, we could group all the other genes of T6SS into those belonging to the puncturing device, and those located in the outer membrane, transmembrane and inner membrane. Based on our analysis, we have proposed a model of T6SS and have compared it with the other bacterial secretion systems